(54) Code-excited linear predictive coder and decoder with conversion filter for converting
stochastic and impulse excitation signals 



(57) A code-excited linear predictive coder or
decoder for a speech signal has an adaptive codebook
(105), a stochastic codebook (106), and a pulse codebook
(107). A constant excitation signal (ec) is obtained
by choosing between a stochastic excitation signal (es)
selected from the stochastic codebook and an impulsive
excitation signal (ep) selected from the pulse codebook.
The constant excitation signal is filtered to produce a varied
excitation signal more closely resembling the original
speech signal. The varied excitation signal is combined
with an adaptive excitation signal (ea) selected from the
adaptive codebook to produce a final excitation signal
(e) which is filtered to generate a synthesized speech
signal. The final excitation signal (e) is also used to
update the adaptive codebook.
Description

BACKGROUND OF THE INVENTION 

The present invention relates to a code-excited linear predictive coder and decoder having features suitable for use
in, for example, a telephone answering machine.

Telephone answering machines have generally employed magnetic cassette tape as the medium for recording
incoming and outgoing messages. Cassette tape offers the advantage of ample recording time, but has the disadvantage
that the recording and playing apparatus takes up considerable space, and the further disadvantage of being unsuitable
for various desired operations. These operations include selective erasing of messages, monotone playback, and rapidly
checking through a large number of messages by reproducing only the initial portion of each message, preferably at a
speed faster than normal speaking speed.

The disadvantages of cassette tape have led manufacturers to consider the use of semiconductor integrated-circuit
memory (referred to below as IC memory) as a message recording medium. At present, IC memory can be employed
for recording outgoing greeting messages, but is not useful for recording incoming messages, because of the large
amount of memory required. For IC memory to become more useful, it must be possible to store more messages in less
memory space, by recording messages with adequate quality at very low bit rates.

Linear predictive coding (LPC) is a well-known method of coding speech at low bit rates. An LPC decoder synthesizes
speech by passing an excitation signal through a filter that mimics the human vocal tract. An LPC coder codes the speech
signal by specifying the filter coefficients, the type of excitation signal, and its power.
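
The synthesis step can be illustrated with a minimal sketch of an all-pole recursion. The sign convention (s[n] = e[n] + sum of a_j * s[n-j]) and the function name `lpc_synthesize` are assumptions made for illustration; the text does not fix either.

```python
def lpc_synthesize(excitation, coeffs):
    """All-pole synthesis: s[n] = e[n] + sum_j coeffs[j-1] * s[n-j].

    The sign convention and the name are illustrative assumptions.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for j, a in enumerate(coeffs, start=1):
            if n - j >= 0:          # samples before the start are taken as zero
                acc += a * out[n - j]
        out.append(acc)
    return out


# A single impulse through a one-pole filter decays geometrically.
print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

The coder only has to transmit the coefficients and a description of the excitation, which is why the bit rate can be low.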

Various types of excitation signals have been used in linear predictive coding. The traditional LPC vocoder, for
example, generates voiced sounds from a pitch-pulse excitation signal (an isolated impulse repeated at regular intervals),
and unvoiced sounds from a white-noise excitation signal. This vocoder system does not provide acceptable speech
quality at very low bit rates.

Code-excited linear prediction (CELP) employs excitation signals drawn from a codebook. The CELP coder finds
the optimum excitation signal by making an exhaustive search of its codebook, then outputs a corresponding index value.
The CELP decoder accesses an identical codebook by this index value and reads out the excitation signal.

More than one codebook may be employed. One CELP system, for example, has a stochastic codebook of fixed
white-noise signals, and an adaptive codebook structured as a shift register. A signal selected from the stochastic
codebook is mixed with a selected segment of the adaptive codebook to obtain the excitation signal, which is then shifted
into the adaptive codebook to update its contents.

CELP coding provides improved speech quality at low bit rates, but at the very low bit rates desired for recording
messages in an IC memory in a telephone set, CELP speech quality has still proven unsatisfactory. The most strongly
impulsive and periodic speech waveforms, occurring at the onset of voiced sounds, for example, are not reproduced
adequately. Very low bit rates also tend to create irritating distortions and quantization noise.

SUMMARY OF THE INVENTION 

The present invention offers an improved CELP system that appears capable of overcoming the above problems
associated with very low bit rates, and has features useful in telephone answering machines.

One object of the invention is to provide a CELP coder and decoder that can reproduce strongly periodic speech 
waveforms satisfactorily, even at low bit rates. 

Another object is to mask the quantization noise that occurs at low bit rates. 
A further object is to reduce distortion at low bit rates. 
Yet another object is to provide means of dealing with nuisance calls.

Still another object is to provide a simple means of varying the playback speed of the reproduced speech signal 
without changing the pitch. 

According to a first aspect of the invention, a CELP coder and decoder for a speech signal each have an adaptive
codebook, a stochastic codebook, a pulse codebook, and a gain codebook. An adaptive excitation signal, corresponding
to an adaptive index, is selected from the adaptive codebook. A stochastic excitation signal is selected from the stochastic
codebook. An impulsive excitation signal is selected from the pulse codebook. A constant excitation signal is selected
by choosing between the stochastic excitation signal and the impulsive excitation signal. A pair of gain values is selected
from the gain codebook.

The constant excitation signal is filtered, using filter coefficients derived from the adaptive index and from linear
predictive coefficients calculated in the coder. The constant excitation signal is thereby converted to a varied excitation
signal more closely resembling the original speech signal input to the coder. The varied excitation signal and adaptive
excitation signal are combined according to the selected pair of gain values to produce a final excitation signal. The final
excitation signal is filtered, using the above-mentioned linear predictive coefficients, to produce a synthesized speech
signal, and is also used to update the contents of the adaptive codebook.

The linear predictive coefficients are obtained in the coder by performing a linear predictive analysis, converting the
analysis results to line-spectrum-pair coefficients, quantizing and dequantizing the line-spectrum-pair coefficients, and
reconverting the dequantized line-spectrum-pair coefficients to linear predictive coefficients.

The speech signal is coded by searching the adaptive, stochastic, pulse, and gain codebooks to find the optimum
excitation signals and gain values, which produce a synthesized speech signal most closely resembling the input speech
signal. The coded speech signal contains the indexes of the optimum excitation signals, the quantized line-spectrum-pair
coefficients, and a quantized power value.

According to a second aspect of the invention, monotone speech is produced by holding the adaptive index fixed
in the coder, or in the decoder.

According to a third aspect of the invention, the speed of the coded speech signal is controlled by detecting periodicity
in the input speech signal and deleting or interpolating portions of the input speech signal with lengths corresponding
to the detected periodicity.

According to a fourth aspect of the invention, the speed of the synthesized speech signal is controlled by detecting
periodicity in the final excitation signal and deleting or interpolating portions of the final excitation signal with lengths
corresponding to the detected periodicity.

According to a fifth aspect of the invention, after the synthesized speech signal has been produced in the decoder,
a white-noise signal is added to the final reproduced speech signal.

According to a sixth aspect of the invention, the stochastic codebook and pulse codebook are combined into a single 
codebook. 


BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a first embodiment of the invented CELP coder. 

FIG. 2 is a block diagram of a first embodiment of the invented CELP decoder. 
FIG. 3 is a block diagram of a second embodiment of the invented CELP coder.

FIG. 4 is a block diagram of a second embodiment of the invented CELP decoder.

FIG. 5 is a block diagram of a third embodiment of the invented CELP coder. 

FIG. 6 is a diagram illustrating deletion of samples to speed up the reproduced speech signal. 

FIG. 7 is a diagram illustrating interpolation of samples to slow down the reproduced speech signal. 
FIG. 8 is a block diagram of a third embodiment of the invented CELP decoder.

FIG. 9 is a block diagram of a fourth embodiment of the invented CELP decoder. 

FIG. 10 is a block diagram illustrating a modification of the excitation circuit in the embodiments above. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 


Several embodiments of the invention will now be described with reference to the attached illustrative drawings, and 
features useful in telephone answering machines will be pointed out. 

First coder embodiment 


FIG. 1 shows a first embodiment of the invented CELP coder. The coder receives a digitized speech signal S at an
input terminal 10, and outputs a coded speech signal M, which is stored in an IC memory 20. The digitized speech signal
S consists of samples of an analog speech signal. The samples are grouped into frames consisting of a certain fixed
number of samples each. Each frame is divided into subframes consisting of a smaller fixed number of samples. The
coded speech signal M contains index values, coefficient information, and other information pertaining to these frames
and subframes. The IC memory is disposed in, for example, a telephone set with a message recording function.
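
The frame and subframe grouping can be pictured with the short sketch below. The frame and subframe lengths are not stated in the text, so the sizes used here are purely hypothetical, as is the helper name `split_frames`; dropping a trailing partial frame is likewise an assumption.

```python
def split_frames(samples, frame_len, subframe_len):
    """Group samples into frames, each divided into equal-length subframes.

    A trailing partial frame is dropped (an assumption, not from the text).
    """
    if frame_len % subframe_len != 0:
        raise ValueError("frame length must be a multiple of subframe length")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        frames.append([frame[i:i + subframe_len]
                       for i in range(0, frame_len, subframe_len)])
    return frames


# Hypothetical sizes: frames of 4 samples, subframes of 2 samples.
print(split_frames(list(range(8)), 4, 2))
# [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
```

Coefficient and power information is then produced once per frame, and index information once per subframe.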

The coder comprises the following main functional circuit blocks: an analysis and quantization circuit 30, which
receives the input speech signal S and generates a dequantized power value (P) and a set of dequantized linear predictive
coefficients (aq); an excitation circuit 40, which outputs an excitation signal (e); an optimizing circuit 50, which selects
an optimum excitation signal (eo); and an interface circuit 60, which writes power information Io, coefficient information
Ic, and index information Ia, Is, Ip, Ig, and Iw in the IC memory 20.

In the analysis and quantization circuit 30, a linear predictive analyzer 101 performs a forward linear predictive
analysis on each frame of the input speech signal S to obtain a set of linear predictive coefficients (a). These coefficients
(a) are passed to a quantizer-dequantizer 102 that converts them to a set of line-spectrum-pair (LSP) coefficients,
quantizes the LSP coefficients, using a vector quantization scheme, to obtain the above-mentioned coefficient information
Ic, then dequantizes this information Ic and converts the result back to linear predictive coefficients, which are output
as the dequantized linear predictive coefficients (aq). One set of dequantized linear predictive coefficients (aq) is output
per frame.

A power quantizer 104 in the analysis and quantization circuit 30 computes the power of each frame of the input
speech signal S, quantizes the computed value to obtain the power information Io, then dequantizes this information Io
to obtain the dequantized power value P.

The excitation circuit 40 has four codebooks: an adaptive codebook 105, a stochastic codebook 106, a pulse
codebook 107, and a gain codebook 108. The excitation circuit 40 also comprises a conversion filter 109, a pair of multipliers
110 and 111, an adder 112, and a selector 113.

The adaptive codebook 105 stores a history of the optimum excitation signal (eo) from the present to a certain
distance back in the past. Like the input speech signal, the excitation signal consists of sample values; the adaptive
codebook 105 stores the most recent N sample values, where N is a fixed positive integer. The history is updated each
time a new optimum excitation signal is selected. In response to what will be termed an adaptive index Ia, the adaptive
codebook 105 outputs a segment of this past history to the first multiplier 110 as an adaptive excitation signal (ea). The
output segment has a length equal to one subframe.

The adaptive codebook 105 thus provides an overlapping series of candidate waveforms which can be output as
the adaptive excitation signal (ea). The adaptive index Ia specifies the point in the stored history at which the output
waveform starts. The distance from this point to the present point (the most recent sample stored in the adaptive
codebook 105) is termed the pitch lag, as it is related to the periodicity or pitch of the speech signal. The adaptive codebook
structure will be illustrated later (FIG. 10).
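
A minimal sketch of such a shift-register history follows. The class name, the direct use of the pitch lag as the lookup key, and the restriction to lags of at least one subframe (so that the whole segment lies inside the stored history) are assumptions made for illustration.

```python
class AdaptiveCodebook:
    """History of the most recent n_hist excitation samples, newest last."""

    def __init__(self, n_hist, subframe_len):
        self.history = [0.0] * n_hist
        self.subframe_len = subframe_len

    def candidate(self, lag):
        """Return one subframe starting `lag` samples before the present point."""
        # Assumption: lags shorter than a subframe are not handled here.
        if not self.subframe_len <= lag <= len(self.history):
            raise ValueError("lag outside the stored history")
        start = len(self.history) - lag
        return self.history[start:start + self.subframe_len]

    def update(self, optimum_excitation):
        """Shift in the new subframe; the oldest samples fall off the front."""
        n = len(self.history)
        self.history = (self.history + list(optimum_excitation))[-n:]


cb = AdaptiveCodebook(n_hist=6, subframe_len=2)
cb.update([1, 2])
cb.update([3, 4])
print(cb.candidate(2), cb.candidate(3), cb.candidate(4))  # [3, 4] [2, 3] [1, 2]
```

Consecutive lags give candidates that overlap by all but one sample, which is the overlapping series of waveforms described above.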

The stochastic codebook 106 stores a plurality of white-noise waveforms. Each waveform is stored as a separate
series of sample values, of length equal to one subframe. In response to a stochastic index Is, one of the stored waveforms
is output to the selector 113 as a stochastic excitation signal (es). The waveforms in the stochastic codebook 106 are
not updated.

The pulse codebook 107 stores a plurality of impulsive waveforms. Each waveform consists of a single, isolated
impulse at a position specified by pulse index Ip. Each waveform is stored as a series of sample values, all but one of
which are zero. The waveform length is equal to one subframe. In response to the pulse index Ip, the corresponding
impulsive waveform is output to the selector 113 as an impulsive excitation signal (ep). The impulsive waveforms in the
pulse codebook 107 are not updated.

The stochastic and pulse codebooks 106 and 107 preferably both contain the same number of waveforms, so that
the stochastic and pulse indexes Is and Ip can efficiently have the same bit length.

The gain codebook 108 stores a plurality of pairs of gain values, which are output in response to a gain index Ig.
The first gain value (b) in each pair is output to the first multiplier 110, and the second gain value (g) to the second
multiplier 111. Before being output, the gain values are scaled according to the dequantized power value P, but the pairs
of gain values stored in the gain codebook 108 are not updated.

The selector 113 selects the stochastic excitation signal (es) or impulsive excitation signal (ep) according to a one-bit
selection index Iw, and outputs the selected excitation signal as a constant excitation signal (ec) to the conversion
filter 109. The coefficients employed in this conversion filter 109 are derived from the adaptive index (Ia), which is received
from the optimizing circuit 50, and the dequantized linear predictive coefficients (aq), which are received from the
quantizer-dequantizer 102. The filtering operation converts the constant excitation signal (ec) to a varied excitation signal
(ev), which is output to the second multiplier 111.
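
The choice made by the selector can be sketched as follows. The mapping of the one-bit selection index (0 for stochastic, 1 for impulsive), the unit pulse amplitude, and the function names are assumptions made for illustration only.

```python
def pulse_waveform(ip, subframe_len):
    """One isolated impulse at position ip; every other sample is zero."""
    wave = [0.0] * subframe_len
    wave[ip] = 1.0              # unit amplitude is an assumption
    return wave


def constant_excitation(iw, index, stochastic_codebook, subframe_len):
    """Return ec: the stochastic waveform if iw == 0, the impulse if iw == 1.

    The meaning assigned to each value of iw is an assumption.
    """
    if iw == 0:
        return list(stochastic_codebook[index])
    return pulse_waveform(index, subframe_len)


noise = [[0.1, -0.2, 0.3, 0.0]]                 # toy one-entry stochastic codebook
print(constant_excitation(0, 0, noise, 4))      # [0.1, -0.2, 0.3, 0.0]
print(constant_excitation(1, 2, noise, 4))      # [0.0, 0.0, 1.0, 0.0]
```

Because the same index field serves both codebooks, only the extra selection bit is needed to tell the decoder which one to read.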

The multipliers 110 and 111 multiply their respective inputs, and furnish the resulting gain-controlled excitation
signals to the adder 112, which adds them to produce the final excitation signal (e) furnished to the optimizing circuit 50.
When an optimum excitation signal (eo) has been determined, this signal is also supplied to the adaptive codebook 105
and added to the past history stored therein.
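
In sample terms, the multipliers and adder compute e[n] = b * ea[n] + g * ev[n]. A minimal sketch, with a hypothetical function name and toy values:

```python
def final_excitation(ea, ev, b, g):
    """e[n] = b * ea[n] + g * ev[n] (multipliers 110 and 111, adder 112)."""
    if len(ea) != len(ev):
        raise ValueError("both signals must span the same subframe")
    return [b * x + g * y for x, y in zip(ea, ev)]


print(final_excitation([1.0, 2.0], [4.0, 0.0], b=0.5, g=0.25))  # [1.5, 1.0]
```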

The optimizing circuit 50 consists of a synthesis filter 114, a perceptual distance calculator 115, and a codebook
searcher 116.

The synthesis filter 114 convolves each excitation signal (e) with the dequantized linear predictive coefficients (aq)
to produce the locally synthesized speech signal Sw. The dequantized linear predictive coefficients (aq) are updated
once per frame.

The perceptual distance calculator 115 computes a sum of the squares of weighted differences between the sample
values of the input speech signal S and the corresponding sample values of the locally synthesized speech signal Sw.
The weighting is accomplished by passing the differences through a filter that reflects the sensitivity of the human ear
to different frequencies. The sum of squares (ew) thus represents the perceptual distance between the input and
synthesized speech signals S and Sw.
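
The distance computation can be sketched as below. The text does not specify the weighting filter, so a short FIR filter with caller-supplied taps stands in for it here; that choice, and the function name, are assumptions.

```python
def perceptual_distance(s, sw, weight_taps):
    """Sum of squares of the weighted differences between s and sw.

    weight_taps is a stand-in FIR weighting filter (an assumption).
    """
    diff = [a - b for a, b in zip(s, sw)]
    total = 0.0
    for n in range(len(diff)):
        # Weighted difference: dw[n] = sum_k weight_taps[k] * diff[n - k]
        dw = sum(w * diff[n - k]
                 for k, w in enumerate(weight_taps) if n - k >= 0)
        total += dw * dw
    return total


print(perceptual_distance([1.0, 2.0], [0.0, 0.0], [1.0]))        # 5.0
print(perceptual_distance([1.0, 2.0], [0.0, 0.0], [1.0, -1.0]))  # 2.0
```

With the single tap [1.0] the measure reduces to a plain squared error; other taps change how strongly each frequency region of the error counts.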

The codebook searcher 116 searches in the codebooks 105, 106, 107, and 108 for the combination of excitation
waveforms and gain values that minimizes the perceptual distance (ew). This combination generates the above-mentioned
optimum excitation signal (eo).

The interface circuit 60 formats the power information Io and coefficient information Ic pertaining to each frame of
the input speech signal S, and the index information pertaining to the optimum excitation signal (eo) in each subframe,
for storage in the IC memory 20 as the coded speech signal M. The index information includes the adaptive, gain, and
selection indexes Ia, Ig, and Iw, and either the stochastic index Is or pulse index Ip, depending on the value of the
selection index Iw. The stored stochastic or pulse index Is or Ip will also be referred to as the constant index.

Although not explicitly indicated in the drawing, the interface circuit 60 is coupled to the quantizer-dequantizer 102,
power quantizer 104, and codebook searcher 116.

Detailed descriptions of the circuit configurations of the above elements will be omitted. All of them can be
constructed from well-known computational and memory circuits. The entire coder, including the IC memory 20, can be built
using a small number of integrated circuits (ICs).

Next the operation of the coder in FIG. 1 will be described. Procedures for performing linear predictive analysis,
calculating LSP coefficients, calculating power, and calculating perceptual distance are well known, so the description
will focus on the generation of the excitation signal and the codebook search procedure.

The described search will be carried out by taking one codebook at a time, in the following sequence: adaptive 
codebook 105, stochastic codebook 106, pulse codebook 107, then gain codebook 108. The invention is not limited, 
however, to this search sequence; any search procedure that yields an optimum excitation signal can be used. 

To find the optimum adaptive excitation signal, the codebook searcher 116 sends the stochastic codebook 106 and
pulse codebook 107 arbitrary index values, and sends the gain codebook 108 a gain index causing it to output, for
example, a first gain value (b) of P and a second gain value (g) of zero. Under these conditions, the codebook searcher
116 sends the adaptive codebook 105 all of the adaptive indexes Ia in sequence, causing the adaptive codebook 105
to output all of its candidate waveforms as adaptive excitation signals (ea), one after another. The resulting excitation
signals (e) are identical to these adaptive excitation signals (ea) scaled by the dequantized power value P.

The synthesis filter 114 convolves each of these excitation signals (e) with the dequantized linear predictive
coefficients (aq). The perceptual distance calculator 115 computes the perceptual distance (ew) between each resulting
synthesized speech signal Sw and the current subframe of the input speech signal S. The codebook searcher 116 selects
the adaptive index Ia that yields the minimum perceptual distance (ew). If the minimum perceptual distance is produced
by two or more adaptive indexes Ia, one of these indexes (the least index, for example) is selected. The selected adaptive
index Ia will be referred to as the optimum adaptive index.
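
The exhaustive search with the least-index tie-break can be sketched as follows. The helper name `search_codebook` and the `distance` callback, which stands in for the synthesis filter followed by the perceptual distance calculator, are hypothetical.

```python
def search_codebook(candidates, distance):
    """Return (index, distance) of the best candidate, least index on ties."""
    best_index, best_dist = None, float("inf")
    for index, cand in enumerate(candidates):
        d = distance(cand)
        if d < best_dist:       # strict '<' keeps the least index when tied
            best_index, best_dist = index, d
    return best_index, best_dist


# Toy example: candidates 1 and 2 are equally close to the target value 1.
print(search_codebook([3, 1, 1, 2], lambda c: (c - 1) ** 2))  # (1, 0)
```

The same loop serves for the stochastic, pulse, and gain codebooks; only the candidates and the fixed settings of the other codebooks change.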

Next, the optimum stochastic excitation signal is found by a similar search of the stochastic codebook 106. The
codebook searcher 116 sends the optimum adaptive index Ia to the adaptive codebook 105 and conversion filter 109,
sends a selection index Iw to the selector 113 causing it to select the stochastic excitation signal (es), and sends a gain
index Ig to the gain codebook 108 causing it to output, for example, a first gain value (b) of zero and a second gain value
(g) of P. The codebook searcher 116 then outputs all of the stochastic index values Is in sequence, causing the stochastic
codebook 106 to output all of its stored waveforms, and selects the waveform that yields the synthesized speech signal
Sw with the least perceptual distance (ew) from the input speech signal S.

During this search of the stochastic codebook 106, the conversion filter 109 filters each stochastic excitation signal
(es). The filtering operation can be described in terms of its transfer function H(z), which is the z-transform of the impulse
response of the conversion filter. One preferred transfer function is the following:

In this equation, p is the number of dequantized linear predictive coefficients (aq) generated by the analysis and
quantization circuit 30. The j-th coefficient is denoted aqj (j = 1, ..., p). L is the pitch lag corresponding to the optimum
adaptive index, A and B are constants such that 0 < A < B < 1, and ε is a constant such that 0 < ε ≤ 1.

The coefficients aqj contain information about the short-term behavior of the input speech signal S. The pitch lag L
describes its longer-term periodicity. The result of the filtering operation is to convert the stochastic excitation signal (es)
to a varied excitation signal (ev) with frequency characteristics more closely resembling the frequency characteristics
of the input speech signal S. The excitation signal (e) is the varied excitation signal (ev) scaled by the dequantized power
value P.

A search is next made for the optimum impulsive excitation signal (ep). The same procedure is followed as in the
search for the optimum stochastic excitation signal, except that the codebook searcher 116 now outputs a selection
index Iw causing the selector 113 to select the impulsive excitation signal (ep), and sends the pulse codebook 107 all
of the pulse indexes Ip. The conversion filter 109 filters the impulsive excitation signals (ep) in the same way that the
stochastic excitation signals (es) were filtered.

If a conversion filter with a transfer function like the above H(z) is employed, the varied excitation signal (ev) contains
pulse clusters that start at a position determined by the pulse index Ip, have a shape determined by the dequantized
linear predictive coefficients (aq), repeat periodically at intervals equal to the pitch lag L determined by the adaptive
index Ia, and decay at a rate determined by the constant ε. Compared with the impulsive excitation signal (ep), or with a
conventional pitch-pulse excitation signal, this varied excitation signal (ev) also has frequency characteristics that more
closely resemble those of the input speech signal S.
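
The pulse-cluster behavior can be illustrated with a sketch that uses an assumed filter structure: a short-term shaping stage built from the coefficients aqj weighted by powers of A and B, followed by a long-term stage that repeats its output every L samples with gain ε. This structure, its sign conventions, and the default constants are assumptions chosen only to be consistent with the definitions of p, L, A, B, and ε given above; they are not taken from the transfer function itself.

```python
def conversion_filter(ec, aq, lag, A=0.6, B=0.9, eps=0.5):
    """Assumed conversion filter: short-term shaping, then pitch repetition.

    Stage 1 (assumed): y[n] = x[n] - sum_j aq_j A^j x[n-j] + sum_j aq_j B^j y[n-j]
    Stage 2 (assumed): v[n] = y[n] + eps * v[n - lag]
    A, B, eps are hypothetical values with 0 < A < B < 1 and 0 < eps <= 1.
    """
    y = []
    for n, x in enumerate(ec):
        acc = x
        for j, a in enumerate(aq, start=1):
            if n - j >= 0:
                acc += a * (B ** j) * y[n - j] - a * (A ** j) * ec[n - j]
        y.append(acc)
    v = []
    for n, val in enumerate(y):
        v.append(val + (eps * v[n - lag] if n - lag >= 0 else 0.0))
    return v


# With no short-term coefficients, an impulse at position 1 repeats every
# 3 samples and halves each time.
print(conversion_filter([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [], lag=3))
# [0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25]
```

Supplying nonzero coefficients spreads each repetition into a short cluster whose shape depends on those coefficients, while the spacing and decay of the clusters remain set by the lag and by ε.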

After finding the optimum impulsive excitation signal (ep), the codebook searcher 116 compares the perceptual
distances (ew) calculated for the optimum stochastic and optimum impulsive excitation signals (es and ep), and selects
the optimum signal (es or ep) that gives the least perceptual distance (ew) as the optimum constant excitation signal
(ec). The corresponding selection index Iw becomes the optimum selection index.

Next, a search is made for the optimum gain index. The codebook searcher 116 outputs the optimum adaptive index
(Ia) and optimum selection index (Iw), and either the optimum stochastic index (Is) or the optimum pulse index (Ip),
depending on which signal is selected by the optimum selection index (Iw). All values of the gain index Ig are then
produced in sequence, causing the gain codebook 108 to output all stored pairs of gain values. These pairs of gain
values represent different mixtures of the adaptive and varied excitation signals (ea and ev). These gain values can also
adjust the total power of the excitation signal. As before, the codebook searcher 116 selects, as the optimum gain index,
the gain index that minimizes the perceptual distance (ew) from the input speech signal S.

When the optimum adaptive excitation signal, optimum constant excitation signal, and optimum pair of gain values
have been found as described above, the codebook searcher 116 furnishes the indexes Ia, Iw, Is or Ip, and Ig that select
these signals and values to the interface circuit 60, to be written in the IC memory 20. In addition, these optimum indexes
are supplied to the excitation circuit 40 to generate the optimum excitation signal (eo) once more, and this optimum
excitation signal (eo) is routed from the adder 112 to the adaptive codebook 105, where it becomes the new most-recent
segment of the stored history. The oldest one-subframe portion of the history stored in the adaptive codebook 105 is
deleted to make room for this new segment (eo). After the adaptive codebook 105 has been updated in this way, the
search for an optimum excitation signal in the next subframe begins.

First decoder embodiment

FIG. 2 shows a first embodiment of the invented CELP decoder. The decoder generates a reproduced speech signal
Sp from the coded speech signal M stored in the IC memory 20 by the coder in FIG. 1. The decoder comprises the
following main functional circuit blocks: an interface circuit 70, a dequantization circuit 80, an excitation circuit 40, and
a filtering circuit 90.

The interface circuit 70 reads the coded speech signal M from the IC memory 20 to obtain power, coefficient, and
index information. Power information Io and coefficient information Ic are read once per frame. Index information (Ia, Iw,
Is or Ip, and Ig) is read once per subframe. The index information includes a constant index that is interpreted as either
a stochastic index (Is) or pulse index (Ip), depending on the value of the selection index (Iw).

The dequantization circuit 80 comprises a coefficient dequantizer 117 and power dequantizer 118. The coefficient
dequantizer 117 dequantizes the coefficient information Ic to obtain LSP coefficients, which it then converts to
dequantized linear predictive coefficients (aq) as in the coder. The power dequantizer 118 dequantizes the power information
Io to obtain the dequantized power value P.

The excitation circuit 40 is identical to the excitation circuit 40 in the coder in FIG. 1. The same reference numerals
are used for this circuit in both drawings.

The filtering circuit 90 comprises a synthesis filter 114 identical to the one in FIG. 1, and a post-filter 119. The
post-filter 119 filters the synthesized speech signal Sw, using information obtained from the dequantized linear predictive
coefficients (aq) supplied by the coefficient dequantizer 117, to compensate for frequency characteristics of the human
auditory sense, thereby generating the reproduced speech signal Sp. A detailed description of this filtering operation
will be omitted, as post-filtering is well known in the art.

The operation of the first decoder embodiment can be understood from the above description and the description
of the first coder embodiment. The interface circuit 70 supplies the dequantization circuit 80 with coefficient and power
information Ic and Io once per frame, and the excitation circuit 40 with index information once per subframe. The excitation
circuit produces the optimum excitation signals (e) that were selected in the coder. The synthesis filter 114 filters these
excitation signals, using the same dequantized linear predictive coefficients (aq) as in the coder, to produce the same
synthesized speech signal Sw, which is modified by the post-filter 119 to obtain a more natural reproduced speech signal
Sp.

EP 0 714 089 A2

From a coded speech signal recorded at a bit rate on the order of four thousand bits per second (4 kbits/s), the
coder and decoder of this first embodiment can generate a reproduced speech signal Sp of noticeably improved quality.
A bit rate of 4 kbits/s allows over an hour's worth of messages to be recorded in sixteen megabits of memory space, an
amount now available in a single IC. A telephone set incorporating the first embodiment can accordingly add
answering-machine functions with very little increase in size or weight.
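
The recording-time figure can be checked with a line of arithmetic. The sketch assumes that a megabit means 2^20 bits and that the bit rate is exactly 4000 bits per second; with a decimal megabit the result is slightly lower but still over an hour.

```python
def recording_minutes(memory_bits, bit_rate):
    """Minutes of speech that fit in memory_bits at bit_rate bits per second."""
    return memory_bits / bit_rate / 60


binary = recording_minutes(16 * 2**20, 4000)     # about 69.9 minutes
decimal = recording_minutes(16 * 10**6, 4000)    # about 66.7 minutes
print(round(binary, 1), round(decimal, 1))
```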

One reason for the improved speech quality at such low bit rates is that the coefficient information Ic is coded by
vector quantization of LSP coefficients. At low bit rates, relatively few bits are available for coding the coefficient
information, so there is inevitably some distortion of the frequency spectrum of the vocal-tract model that the coefficients
represent, due to quantization error. With LSP coefficients, a given amount of quantization error is known to produce
less distortion than would be produced by the same amount of quantization error with linear predictive coefficients,
because of the superior interpolation properties of LSP coefficients. LSP coefficients are also known to be well suited
for efficient vector quantization.

A second reason for the improved speech quality is the provision of the pulse codebook 107, which is not found in
conventional CELP systems. These conventional systems depend on the recycling of stochastic excitation signals
through the adaptive codebook to produce periodic excitation waveforms, but at very low bit rates, the selection of signals
is not adequate to produce excitation waveforms of a strongly impulsive character. The most strongly periodic waveforms,
which occur at the onset and sometimes in the plateau regions of voiced sounds, have this impulsive character. By
adding a codebook 107 of impulsive waveforms, the present invention makes possible more faithful reproduction of the
most strongly impulsive and most strongly periodic speech waveforms.

A third reason for the improved speech quality is the conversion filter 109. It has been experimentally shown that
the frequency characteristics of the waveforms that excite the human vocal tract resemble the complex frequency
characteristics of the sounds that emerge from the speaker's mouth, and differ from the oversimplified characteristics of pure
white noise or pure impulses. Filtering the stochastic and impulsive excitation signals (es and ep) to make their frequency
characteristics more closely resemble those of the input speech signal S brings the excitation signal into better accord
with reality, resulting in more natural reproduced speech. This improvement is moreover achieved with no increase in
the bit rate, because the conversion filter 109 uses only information (Ia and aq) already present in the coded speech
signal.

A further benefit of the conversion filter 109 is that emphasizing frequency components actually present in the
input speech signal helps mask spurious frequency components produced by quantization error.

The combination of the pulse codebook 107 and conversion filter 109 provides an excitation signal that varies in
shape, periodicity, and phase. This excitation signal is far superior to the pitch pulse found in conventional LPC vocoders,
which varies only in periodicity. It is also produced more efficiently than would be possible with conventional CELP
coding, which would require each of these excitation signals to be stored as a separate stochastic waveform.

The capability to switch between stochastic and impulsive excitation signals also improves the reproduction of transient
portions of the speech signal. The overall perceived effect of the combined addition of the pulse codebook 107,
conversion filter 109, and selector 113 is that speech is reproduced more clearly and naturally.

The impulse waveforms in the pulse codebook 107 could, incidentally, be produced by an impulse signal generator.
Use of a pulse codebook 107 is preferred, however, because that simplifies synchronization of the impulsive and adaptive
excitation signals, and enables the stochastic and pulse indexes Is and Ip to be processed in a similar manner.

Second coder embodiment

FIG. 3 shows a second embodiment of the invented CELP coder, using the same reference numerals as in FIG. 1
to designate identical or equivalent parts. This coder enables messages to be recorded in a normal voice or monotone
voice, at the user's option. The second coder embodiment is intended for use with the first decoder embodiment, shown
in FIG. 2.

Monotone recording is useful in a telephone answering machine as a countermeasure to nuisance calls, applicable 
to both incoming and outgoing messages. For incoming messages, if certain types of nuisance calls are recorded in a
monotone, they sound less offensive when played back. For outgoing messages, if the nuisance caller is greeted in a
robot-like, monotone voice, he is likely to be discouraged and hang up. A further advantage of the monotone feature is
that the telephone user can record an outgoing message without revealing his or her identity. 

Referring to FIG. 3, the coder of the second embodiment adds an index converter 120 to the coder structure of the
first embodiment. The index converter 120 receives a monotone control signal (con1) from the device that controls the
telephone set, and the index (Ia) of the optimum adaptive excitation signal from the codebook searcher 116. When the
monotone control signal (con1) is inactive, the index converter 120 passes the optimum adaptive index (Ia) to the interface
circuit 60 without alteration. When the monotone control signal (con1) is active, the index converter 120 replaces the
optimum adaptive index (Ia) with a fixed index (Iac), unrelated to the optimum index (Ia), and furnishes the fixed index
(Iac) to the interface circuit 60. The monotone control signal (con1) is activated or deactivated in response to, for example,
the press of a pushbutton on the telephone set.

As explained in the first embodiment, the adaptive index specifies the pitch lag. Supplied to both the adaptive codebook
105 and conversion filter 109, this index is the main determinant of the periodicity of the excitation signal, hence
of the pitch of the synthesized speech signal. If a fixed adaptive index (Iac) is supplied to the adaptive codebook 105
and conversion filter 109 in place of the optimum index (Ia), the resulting excitation signal (e) will have a substantially
unchanging pitch, and the synthesized speech signal (Sw) will have a flat, genderless, robot-like quality.
Other operations and effects of the second coder embodiment are the same as in the first embodiment. 
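
The pass-through and substitution behavior of the index converter 120 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `convert_index` and the fixed index value 40 are hypothetical and do not come from the specification, which leaves the value of the fixed index open.

```python
def convert_index(optimum_index: int, monotone_active: bool, fixed_index: int = 40) -> int:
    """Return the adaptive index to hand to the interface circuit.

    fixed_index stands in for the fixed index Iac; 40 is a placeholder value.
    """
    # con1 inactive: the optimum index Ia passes through without alteration.
    # con1 active: Ia is replaced by the fixed index, whatever Ia was.
    return fixed_index if monotone_active else optimum_index


# Optimum adaptive indexes found for four successive frames (made-up values).
frame_indexes = [12, 57, 33, 90]
normal = [convert_index(ia, False) for ia in frame_indexes]
monotone = [convert_index(ia, True) for ia in frame_indexes]
```

With the control signal active, every frame carries the same adaptive index, so the pitch lag no longer varies from frame to frame.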

Second decoder embodiment 

FIG. 4 shows a second embodiment of the invented CELP decoder, using the same reference numerals as in FIG.
2 to designate identical or equivalent parts. This decoder is intended for use with the first coder embodiment, shown in 
FIG. 1, to enable optional playback of the recorded speech signal in a monotone voice. 

As can be seen from FIGs. 4 and 2, the second embodiment adds an index converter 122 to the decoder structure
of the first embodiment, between the interface circuit 70 and excitation circuit 40. The index converter 122 receives a
monotone control signal (con1) from the device that controls the telephone set, and the optimum adaptive index (Ia)
from the interface circuit 70. When the monotone control signal (con1) is inactive, the optimum adaptive index (Ia) is
passed to the adaptive codebook 105 and conversion filter 109 without alteration. When the monotone control signal
(con1) is active, the index converter 122 replaces the optimum adaptive index (Ia) with a fixed index (Iac), unrelated to
the optimum adaptive index (Ia), and supplies this fixed index (Iac) to the adaptive codebook 105 and conversion filter 109.

As in the second coder embodiment, when the monotone control signal (con1) is active, the excitation signal (e)
has a generally unchanging pitch, and the reproduced speech signal (Sp) is substantially a monotone. For outgoing
messages, the decoder in FIG. 4 provides the same advantages as the coder in FIG. 3. For incoming messages, the
decoder in FIG. 4 provides the ability to decide, on a message-by-message basis, whether to play the message back
in its natural voice or a monotone voice. Nuisance calls can then be played back in the inoffensive monotone, while other 
calls are played back normally. 

Other operations and effects of the second decoder embodiment are the same as in the first embodiment. 

Third coder embodiment 



FIG. 5 shows a third embodiment of the invented CELP coder, using the same reference numerals as in FIG. 1 to
designate identical or equivalent parts. The third coder embodiment permits the speed of the speech signal to be con- 
verted when the signal is coded and recorded, without altering the pitch. This coder is intended for use with the first 
decoder embodiment, shown in FIG. 2. 

As can be seen from FIGs. 5 and 1, the third coder embodiment adds a speed controller 124 comprising a buffer
memory 126, a periodicity analyzer 128, and a length adjuster 130 to the coder structure of the first embodiment. The
speed controller 124 is disposed in the input stage of the coder, to convert the input speech signal S to a modified speech
signal Sm. The modified speech signal Sm is supplied to the analysis and quantization circuit 30 and optimizing circuit
50 in place of the original speech signal S, and is coded in the same way as the input speech signal S was coded in the
first embodiment.

The speed controller 124 receives a speed control signal (con2) that designates a speed factor (sf). When the
designated speed factor is unity (sf = 1), the speed controller 124 does nothing, and the modified speech signal Sm is
identical to the input speech signal S. When the speed factor is less than unity (sf < 1), designating a speaking speed
faster than normal, the speed controller 124 deletes samples from the input speech signal S to produce the modified
speech signal Sm. When the speed factor is greater than unity (sf > 1), designating a speed slower than normal, the
speed controller 124 inserts extra samples into the input speech signal S to produce the modified speech signal Sm.

The speed control signal (con2) is produced in response to, for example, the push of a button on a telephone set.
The telephone may have buttons marked fast, normal, and slow, or the digit keys on a pushbutton telephone can be 
used to select a speed on a scale from, for example, one (very slow) to nine (very fast). 

In the speed controller 124, the buffer memory 126 stores at least two frames of the input speech signal S. The
periodicity analyzer 128 analyzes the periodicity of each frame, determines the principal periodicity present in the frame,
and outputs a cycle count (cc) indicating the number of samples per cycle of this periodicity.
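
The specification does not say how the periodicity analyzer 128 finds the principal periodicity. One common approach, assumed here purely for illustration, is to pick the lag that maximizes the normalized autocorrelation of the frame. The function name `cycle_count` and the search limits `min_lag` and `max_lag` are hypothetical.

```python
import math


def cycle_count(frame, min_lag=20, max_lag=160):
    """Estimate the samples per cycle (cc) of the strongest periodicity in a frame.

    Assumed method: normalized autocorrelation peak search over a lag range.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        head = frame[:-lag]   # the frame without its last `lag` samples
        tail = frame[lag:]    # the same span delayed by `lag` samples
        energy = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        score = sum(x * y for x, y in zip(head, tail)) / energy if energy > 0 else 0.0
        if score > best_score:   # strict comparison: ties go to the shortest lag
            best_lag, best_score = lag, score
    return best_lag


# Test signal: one pulse every 50 samples in a 320-sample frame.
frame = [1.0 if i % 50 == 0 else 0.0 for i in range(320)]
cc = cycle_count(frame)
```

Preferring the shortest lag among equal peaks avoids reporting a multiple of the true cycle length, which matters because the length adjuster works in blocks of exactly cc samples.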

The length adjuster 130 calculates the difference (di) between the fixed number of samples per frame (nf) and this
number multiplied by the speed factor (nf x sf), then finds the number of whole cycles that is closest to this difference.
That is, the length adjuster 130 finds an integer (n) such that n x cc is as close as possible to the calculated difference (di).
Conceptually, the difference (di) is divided by the cycle count (cc) and the result is rounded off to the nearest integer (n).
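
The calculation just described can be written out as a short Python sketch. The function name `whole_cycles` is hypothetical, and the sign convention (positive n means cycles to delete, negative n means cycles to insert) is an assumption made for the example; the text itself speaks only of the size of the difference.

```python
def whole_cycles(nf: int, sf: float, cc: int) -> int:
    """Number of whole cycles to delete (positive) or insert (negative) per frame."""
    di = nf - nf * sf      # difference between the frame length and nf x sf
    # Divide by the cycle count and round to the nearest integer.
    # Note: Python's round() sends exact halves to the nearest even integer.
    return round(di / cc)


n_fast = whole_cycles(320, 2 / 3, 50)   # di is about 106.7 samples
n_slow = whole_cycles(320, 1.5, 80)     # di is -160 samples
n_same = whole_cycles(320, 1.0, 50)     # di is 0, nothing to do
```

These inputs correspond to the two cases worked through with FIGs. 6 and 7, where two whole cycles are deleted and inserted respectively.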

If this integer (n) is not zero, the length adjuster 130 proceeds to delete or interpolate samples. Samples are deleted
or interpolated in blocks, the block length being equal to the cycle count (cc), so that each deleted or interpolated block
represents one whole cycle of the periodicity found by the periodicity analyzer 128.

FIG. 6 illustrates deletion when the frame length (nf) is three hundred twenty samples, the speed factor (sf) is two- 
thirds, and the cycle count (cc) is fifty. One frame of the input speech signal S, comprising three hundred twenty (nf) 
samples, is shown at the top, divided into cycles of fifty samples each. The frame contains six such cycles, numbered 
from (1) to (6), plus a few remaining samples.

The difference value (di) in this example is slightly more than one hundred samples, so the closest number of whole
cycles is two (n = 2). The length adjuster 130 accordingly deletes two whole cycles. The simplest way to select the cycles
to be deleted is to delete the initial cycles, in this case the first two cycles (1) and (2), as illustrated. The modified speech
signal Sm accordingly contains only the last two hundred twenty samples from this frame

[ nf - (n x cc) = 320 - (2 x 50) = 220 ].

After similarly deleting cycles from the next frame, the length adjuster 130 reframes the modified speech signal Sm
so that each frame again consists of three hundred twenty samples. The above two hundred twenty samples, for example, 
can be combined with the first one hundred non-deleted samples of the next frame, indicated by the numbers (9) and 
(10) in the drawing, to make one complete frame of the modified speech signal Sm. 
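
The deletion and reframing of FIG. 6 can be traced with a small Python sketch. The helper names `delete_initial_cycles` and `reframe` are hypothetical, and sample values are replaced by their positions in the input stream so that the result is easy to inspect.

```python
def delete_initial_cycles(frame, n, cc):
    """Drop the first n whole cycles (n x cc samples) from one frame."""
    return frame[n * cc:]


def reframe(samples, nf):
    """Cut a sample stream into complete frames of nf samples plus a remainder."""
    full = len(samples) // nf
    frames = [samples[k * nf:(k + 1) * nf] for k in range(full)]
    return frames, samples[full * nf:]


NF, CC, N = 320, 50, 2              # frame length, cycle count, cycles to delete
frame1 = list(range(0, 320))        # sample positions of the first input frame
frame2 = list(range(320, 640))      # sample positions of the next input frame

kept1 = delete_initial_cycles(frame1, N, CC)    # 220 samples remain
kept2 = delete_initial_cycles(frame2, N, CC)    # 220 samples remain
frames, leftover = reframe(kept1 + kept2, NF)   # one full frame, 120 samples carried over
```

The first output frame holds the last 220 samples of the first input frame followed by the first 100 surviving samples of the next one; the remaining 120 samples wait to be combined with later input.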

FIG. 7 illustrates interpolation when the frame length (nf) is three hundred twenty samples, the speed factor (sf) is
1.5, and the cycle count (cc) is eighty. One frame now consists of four cycles, numbered (1) to (4). The difference (di)
is one hundred sixty samples, or exactly two cycles (n = 2). The length adjuster 130 interpolates two whole cycles by,
for example, repeating each of the first two cycles (1) and (2) in the modified speech signal Sm, as shown. The input
frame is thereby expanded to four hundred eighty samples [ nf + (n x cc) = 320 + (2 x 80) = 480 ]. After interpolation,
the modified speech signal Sm is reframed into frames of three hundred twenty samples each.
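
The interpolation of FIG. 7 can be sketched in the same way. The function name `repeat_initial_cycles` is hypothetical; each sample is labeled with the number of the cycle it belongs to, so the order of cycles after interpolation can be read off directly. With nf = 320, n = 2, and cc = 80, the expanded frame has 320 + (2 x 80) = 480 samples.

```python
def repeat_initial_cycles(frame, n, cc):
    """Insert n extra cycles by repeating each of the first n cycles once."""
    out = []
    for k in range(n):
        cycle = frame[k * cc:(k + 1) * cc]
        out.extend(cycle)       # the original cycle
        out.extend(cycle)       # its interpolated copy
    out.extend(frame[n * cc:])  # the remaining cycles are left untouched
    return out


NF, CC, N = 320, 80, 2
frame = [i // CC + 1 for i in range(NF)]        # cycle labels 1, 2, 3, 4
expanded = repeat_initial_cycles(frame, N, CC)
cycle_order = expanded[::CC]                    # one label per 80-sample block
```

Repeating a complete cycle keeps the waveform period unchanged, which is why the speech slows down without a change in pitch.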

Operation of the other parts of the coder in FIG. 5 is the same as in the first embodiment, so a description will be
omitted. 

By deleting or interpolating whole cycles, the speed controller 124 can slow down or speed up the speech signal
without altering its pitch, and with a minimum of disturbance to the periodic structure of the speech waveform. The
modified speech signal Sm accordingly sounds like a person speaking in a normal voice, but speaking rapidly (if sf < 1)
or slowly (if sf > 1).

One effect of speeding up the speech signal in the coder is to permit more messages to be recorded in the IC 
memory 20. If the speed factor (sf) is two-thirds, for example, the recording time is extended by fifty per cent. A person 
who expects many calls can use this feature to avoid overflow of the IC memory 20 in his telephone answering machine.
Another effect of speeding up the speech signal is, of course, that it shortens the playback time.

An effect of slowing down the speech signal is that recorded messages become easier to understand when played 
back. 

Either speeding up or slowing down the outgoing greeting message recorded in a telephone answering machine is 
a possible deterrent to nuisance calls. 


Third decoder embodiment 

FIG. 8 shows a third embodiment of the invented decoder, using the same reference numerals as in FIG. 2 to 
designate identical or equivalent parts. The decoder of the third embodiment permits the speed of the speech signal to
be altered when the signal is decoded and played back, without altering the pitch. This decoder is intended for use with the
coder of the first embodiment, shown in FIG. 1.

As can be seen from FIGs. 8 and 2, the third embodiment adds a speed controller 132 to the decoder structure of 
the first embodiment. The speed controller 132 is disposed between the excitation circuit 40 and filtering circuit 90, and
operates on the excitation signal (e) to produce a modified excitation signal (em). The speed controller 132 is similar to
the speed controller 124 in the coder of the third embodiment, comprising a buffer memory 134, a periodicity analyzer
136, and a length adjuster 138, which operate similarly to the corresponding elements 126, 128, and 130 in FIG. 5. The
speed control signal (con2) designates a speed factor (sf), as in the third coder embodiment. 

The buffer memory 134 stores the optimum excitation signals (e) output by the adder 112 over a certain segment 
with a length of at least one frame. The periodicity analyzer 136 finds the principal frequency component of the excitation
signal (e) during, for example, one frame, and outputs a corresponding cycle count (cc), as described above. The length
adjuster 138 deletes or interpolates a number of samples equal to an integer multiple (n) of the cycle count (cc) in the 
excitation signal (e), the samples being deleted or interpolated in blocks with a block length equal to the cycle count 
(cc). The multiple (n) is determined by the speed factor (sf) specified by the speed control signal (con2), as in the third 
coder embodiment. 

After deleting or interpolating samples, the length adjuster 138 calculates the resulting frame length (si) of the modified
excitation signal (em), i.e., the number of samples in one modified frame, and furnishes the number (si) to the
interface circuit 70, dequantizing circuit 80, and filtering circuit 90. This number (si) controls the rate at which the coded
speech signal M is read out of the IC memory 20, the intervals at which new dequantized power values P are furnished
to the excitation circuit 40, and the intervals at which the linear predictive coefficients (aq) are updated. Instead of reframing
the excitation signal to a standard length, the length adjuster 138 instructs the other parts of the decoder to operate
in synchronization with the variable frame length of the modified excitation signal (em).

Aside from using a variable frame length (si), the other parts of the decoder operate as in the first embodiment, so
further description will be omitted.

By shortening or lengthening the excitation signal as described above, the decoder in FIG. 8 can speed up or slow
down the reproduced speech signal Sp without altering its pitch. The shortening or lengthening is accomplished with
minimum disturbance to the periodic structure of the excitation signal, because samples are deleted or interpolated in
whole cycles. Any disturbances that do occur are moreover reduced by filtering in the filtering circuit 90, so the reproduced
speech signal Sp is relatively free of artifacts, apart from the change in speed. For this reason, deleting or interpolating
samples in the excitation signal (e) is preferable to deleting or interpolating samples in the reproduced speech signal (Sp).

The third decoder embodiment provides effects already described under the third coder embodiment: in a telephone 
answering machine, recorded incoming messages can be speeded up to shorten the playback time, or slowed down if 
they are difficult to understand, and recorded outgoing messages can be reproduced at an altered speed to deter nuisance
calls. One capability afforded by the third decoder embodiment is the capability to scan through a large number
of messages at high speed (sf < 1) to find a particular message, which is then played back at normal speed (sf = 1). 
Another is the capability to play back desired calls at normal speed, and undesired or nuisance calls at a faster speed. 

Fourth decoder embodiment 

FIG. 9 shows a fourth embodiment of the invented CELP decoder, using the same reference numerals as in FIG. 2 
to designate identical or equivalent parts. This fourth decoder embodiment is intended for use with the first coder embodiment
shown in FIG. 1. The fourth decoder embodiment is adapted to mask pink noise in the reproduced speech signal.

Although the first embodiment reduces and masks distortion and quantization noise to a considerable extent, these 
effects cannot be eliminated completely: at very low bit rates the reproduced speech signal always has an audible coding- 
noise component. It has been experimentally found that the coding noise tends not to be of the relatively innocuous 
white type, which has a generally flat frequency spectrum, but of the more irritating pink type, which has conspicuous
frequency characteristics. 

A similar effect of low bit rates is that natural background noise present in the original speech signal is modulated
by the coding and decoding process so that it takes on the character of pink noise. 

Strictly speaking, pink noise is defined as having increasing intensity at decreasing frequencies. The term will be 
used herein, however, to denote any type of noise with a noticeable frequency pattern. Pink noise is perceived as an 
audible hum, whine, or other annoying effect. 

As can be seen from FIGs. 9 and 2, the fourth decoder embodiment adds a white-noise generator 140 and adder
142 to the structure of the first decoder embodiment. The white-noise generator 140 generates a white-noise signal (nz)
with a power responsive to the dequantized power value P. Methods of generating such noise signals are well known in
the art. The adder 142 adds this white-noise signal (nz) to the speech signal output from the post-filter 214 to create the
final reproduced speech signal Sp.

Aside from this final addition of a white-noise signal (nz), the fourth decoder embodiment operates like the first
decoder embodiment. The white-noise signal (nz) masks pink noise present in the output of the post-filter 214, making
the pink noise less obtrusive. The noise component in the final reproduced speech signal Sp therefore sounds more like 
natural background noise, which the human ear readily ignores. 
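
A minimal Python sketch of the white-noise generator 140 and adder 142 follows. Several details are assumptions made only for illustration: the power value P is treated as a mean-square value, the noise is Gaussian, and the parameter `noise_ratio`, which sets the noise power as a fraction of P, is hypothetical. The specification states only that the noise power is responsive to P.

```python
import math
import random


def add_masking_noise(speech, power, noise_ratio=0.01, seed=0):
    """Add white noise (nz) whose power tracks the dequantized power value."""
    rng = random.Random(seed)               # seeded so the example is repeatable
    sigma = math.sqrt(noise_ratio * power)  # standard deviation of the noise
    return [s + rng.gauss(0.0, sigma) for s in speech]


silence = [0.0] * 10000
noisy = add_masking_noise(silence, power=4.0, noise_ratio=0.25, seed=1)
measured = sum(x * x for x in noisy) / len(noisy)   # close to 0.25 * 4.0 = 1.0
```

Scaling the noise with P keeps the masking noise quiet during quiet passages, so it stays in the background rather than becoming a new disturbance.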

Modified excitation circuit 

FIG. 10 shows a modified excitation circuit, in which the stochastic and pulse codebooks 106 and 107 and selector
113 are combined into a single fixed codebook 150. This fixed codebook 150 contains a certain number of stochastic
waveforms 152 and a certain number of impulsive waveforms 154, and is indexed by a combined index Ik. The combined
index Ik replaces the stochastic index Is, pulse index Ip, and selection index Iw in the preceding embodiments.

As in the preceding embodiments, the stochastic waveforms represent white noise, and the impulsive waveforms 
consist of a single impulse each. The fixed codebook 150 outputs the waveform indicated by the constant index Ik as 
the constant excitation signal ec. 
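
A combined codebook of this kind can be sketched as a single list in which the stochastic entries come first and the impulsive entries follow. The layout, the function name `build_fixed_codebook`, the Gaussian noise, and the rule that the k-th impulsive entry has its impulse at sample k are all assumptions for illustration.

```python
import random


def build_fixed_codebook(num_stochastic, num_impulsive, length, seed=0):
    """Build one codebook holding white-noise and single-impulse waveforms."""
    rng = random.Random(seed)
    stochastic = [[rng.gauss(0.0, 1.0) for _ in range(length)]
                  for _ in range(num_stochastic)]
    # Each impulsive waveform is a single unit impulse; entry k puts it at sample k.
    impulsive = [[1.0 if i == pos else 0.0 for i in range(length)]
                 for pos in range(num_impulsive)]
    return stochastic + impulsive


# Six stochastic and two impulsive waveforms: the two counts need not match.
codebook = build_fixed_codebook(num_stochastic=6, num_impulsive=2, length=8)
ec = codebook[7]    # a single combined index selects the excitation signal
```

Because one index addresses both kinds of waveform, no separate selection index is needed, and the split between stochastic and impulsive entries can be chosen freely.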

The other elements in FIG. 10 are identical to the elements with the same reference numerals in the preceding 
embodiments. FIG. 10 has been drawn to show more clearly the structure of the gain codebook 108, which stores pairs
of gain values b_k and g_k (k = 1, 2, ...).

FIG. 10 also shows the structure of the adaptive codebook 105. The final or optimum excitation signal (e) is shifted
into the adaptive codebook 105 from the right end in the drawing, so that older samples are stored to the left of newer
samples. When a segment 156 of the stored waveform is output as an adaptive excitation signal (ea), it is output from
left to right. The pitch lag L that identifies the beginning of the segment 156 is calculated by, for example, adding a certain
constant C to the adaptive index Ia, this constant C representing the minimum pitch lag.
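
The shift-type adaptive codebook can be sketched as a fixed-length buffer. The class name `AdaptiveCodebook` and its method names are hypothetical. One point is an assumption not covered by the text above: when the pitch lag is shorter than the requested segment, the sketch repeats the last L samples periodically, which is a common convention in CELP coders.

```python
class AdaptiveCodebook:
    """Shift buffer holding the most recent samples of the excitation signal."""

    def __init__(self, size, min_lag):
        self.buf = [0.0] * size     # oldest sample at the left, newest at the right
        self.min_lag = min_lag      # the constant C, the minimum pitch lag

    def excitation(self, index, length):
        """Return the adaptive excitation signal (ea) for adaptive index Ia."""
        lag = index + self.min_lag  # pitch lag L = Ia + C
        if not 0 < lag <= len(self.buf):
            raise ValueError("pitch lag outside the stored waveform")
        start = len(self.buf) - lag
        # Read left to right from L samples back; wrap if the lag is short (assumption).
        return [self.buf[start + (i % lag)] for i in range(length)]

    def update(self, e):
        """Shift the final excitation signal (e) in from the right end."""
        self.buf = (self.buf + list(e))[-len(self.buf):]


cb = AdaptiveCodebook(size=8, min_lag=2)
cb.update([1, 2, 3, 4, 5, 6, 7, 8])
ea_long = cb.excitation(index=2, length=4)    # lag 4: the last four samples
ea_short = cb.excitation(index=0, length=4)   # lag 2: the last two samples, repeated
```

Since the coder and decoder both update the buffer with the same final excitation signal, transmitting the index alone is enough for the decoder to reproduce the same segment.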

The excitation circuit in FIG. 10 operates substantially as described in the first embodiment, and provides similar
effects. The codebook searcher 116 searches the single fixed codebook 150 instead of making separate searches of
the stochastic and pulse codebooks 106 and 107 and then choosing between them, but the end result is the same.

The excitation circuit in FIG. 10 can replace the excitation circuit 40 in any of the preceding embodiments. An advantage
of the circuit in FIG. 10 is that the numbers of stochastic and impulsive waveforms stored in the fixed codebook
150 need not be the same.

Other variations

The invention is not limited to the embodiments and modification described above, but has many possible variations,
some of which are described below.

In the embodiments above, the codebook searcher 116 was described as making a sequential search of each
codebook, but the coder can be designed to process two or more excitation signals in parallel, to speed up the search
process.

The first gain value need not be zero during the searches of the stochastic and pulse codebooks, or of the constant 
codebook. A non-zero first gain value can be output. 

Although the coder and decoder have been shown as if they were separate circuits, they have many circuit elements 
in common. In a device such as a telephone answering machine having both a coder and decoder, the common circuit
elements can of course be shared. 

Although preferably practiced with specially-designed integrated circuits, the invention can also be practiced by 
providing a general-purpose computing device, such as a microprocessor or digital signal processor (DSP), with programs
to execute the functions of the circuit blocks shown in the drawings.

The embodiments above showed forward linear predictive coding, in which the coder calculates the linear predictive
coefficients directly from the input speech signal S. The invention can also be practiced, however, with backward linear 
predictive coding, in which the linear predictive coefficients of the input speech signal S are computed, not from the input 
speech signal S itself, but from the locally reproduced speech signal Sw. 

The adaptive codebook 105 was described as being of the shift type, that stores the most recent N samples of the
optimum excitation signal, but the invention is not limited to this adaptive codebook structure.

Although the first embodiment prescribes an adaptive codebook, a stochastic codebook, a pulse codebook, and a
gain codebook, the novel features of second, third, and fourth embodiments can be added to CELP coders and decoders
with other codebook configurations, including the conventional configuration with only an adaptive codebook and a
stochastic codebook, in order to reproduce speech in a monotone voice, or at an altered speed, or to mask pink noise.

The speed controllers in the third embodiment are not restricted to deleting or repeating the initial cycles in a frame
as shown in FIGs. 6 and 7. Other methods of selecting the cycles to be deleted or repeated can be employed. The
unit within which deletion and repetition are carried out need not be one frame; other units can be used.

The white-noise signal (nz) generated in the fourth embodiment need not be responsive to the dequantized power
value P. A white-noise signal with fixed variations, unrelated to P, could be used instead. A noise signal (nz) of this type
can be stored in advance and read out repeatedly, in which case the noise generator 140 requires only means for storing
and reading a fixed waveform.

The second, third, and fourth embodiments can be combined, or any two of them can be combined. 

Although the invention has been described as being used in a telephone answering machine, this is not its only
possible application. The invention can be employed to store messages in electronic voice mail systems, for example.
It can also be employed for wireless or wireline transmission of digitized speech signals at low bit rates.

Those skilled in the art will recognize that other variations are also possible without departing from the scope claimed
below. 

Claims 


1. A code-excited linear predictive coder for coding an input speech signal, comprising: 

a power quantizer (104) for calculating a power value of said input speech signal, quantizing said power value
to obtain power information, and dequantizing said power information to obtain a dequantized power value;

a linear predictive analyzer (101) for calculating linear predictive coefficients of said input speech signal;

a quantizer-dequantizer (102) coupled to said linear predictive analyzer (101), for converting said linear predictive
coefficients to line-spectrum-pair coefficients, quantizing said line-spectrum-pair coefficients to obtain coefficient
information, then dequantizing said coefficient information to obtain dequantized line-spectrum-pair
coefficients and converting said dequantized line-spectrum-pair coefficients back to linear predictive coefficients,
thereby obtaining dequantized linear predictive coefficients;

an adaptive codebook (105) for storing a plurality of candidate waveforms, modifying said candidate waveforms
responsive to an optimum excitation signal, and outputting one of said candidate waveforms, responsive to
an adaptive index, as an adaptive excitation signal;

a stochastic codebook (106) for storing a plurality of white-noise waveforms, and outputting one of said white-noise
waveforms, responsive to a stochastic index, as a stochastic excitation signal;

a pulse codebook (107) for storing a plurality of impulsive waveforms, and outputting one of said impulsive
waveforms, responsive to a pulse index, as an impulsive excitation signal;

a selector (113) coupled to said stochastic codebook (106) and said pulse codebook (107), for selecting a
constant excitation signal by choosing between said stochastic excitation signal and said impulsive excitation signal,
responsive to a selection index;

a conversion filter (109) coupled to said selector (113), for filtering said constant excitation signal, responsive
to said adaptive index and said dequantized linear predictive coefficients, to produce a varied excitation signal more
closely resembling said input speech signal in frequency characteristics;

a gain codebook (108) coupled to said power quantizer (104), for storing a plurality of pairs of gain values,
outputting one of said pairs responsive to a gain index, and scaling said one of said pairs responsive to said dequantized
power value, thereby producing a first gain value and a second gain value;

a first multiplier (110) coupled to said gain codebook (108) and said conversion filter (109), for multiplying
said adaptive excitation signal by said first gain value to produce a first gain-controlled excitation signal;

a second multiplier (111) coupled to said gain codebook (108) and said adaptive codebook (105), for multiplying
said varied excitation signal by said second gain value to produce a second gain-controlled excitation signal;

an adder (112) coupled to said first multiplier (110) and said second multiplier (111), for adding said first gain-controlled
excitation signal and said second gain-controlled excitation signal to produce a final excitation signal;

an optimizing circuit (50) coupled to said quantizer-dequantizer (102) and said adder (112), for generating a
synthesized speech signal from said final excitation signal and said dequantized linear predictive coefficients, comparing
said synthesized speech signal with said input speech signal, and determining optimum values of said adaptive
index, said stochastic index, said pulse index, said selection index, and said gain index, said optimum excitation
signal being produced as said final excitation signal in response to said optimum values; and

an interface circuit (60) coupled to said optimizing circuit (50), for combining said optimum values, said power
information, and said coefficient information to generate a coded speech signal.

2. The coder of claim 1, wherein the candidate waveforms stored in said adaptive codebook (105) are past segments
of said optimum excitation signal, starting at points designated by said adaptive index.

3. The coder of claim 1, wherein each of the impulsive waveforms stored in said pulse codebook (107) consists of a
single isolated impulse, disposed at a position designated by said pulse index.

4. The coder of claim 3 wherein, when said selector (113) selects said impulsive excitation signal, said conversion
filter (109) produces a varied excitation signal consisting of pulse clusters with a shape responsive to said dequantized
linear predictive coefficients, repeated at intervals determined by said adaptive index, starting from a position
determined by said pulse index.

5. The coder of claim 1, wherein said stochastic codebook 106, said pulse codebook 107, and said selector 113 are
replaced by a single fixed codebook 150 storing both said white-noise waveforms and said impulsive waveforms,
and said stochastic index, said pulse index, and said selection index are replaced by a single combined index.

6. The coder of claim 1, further comprising an index converter (120) for supplying said interface circuit (60) with a fixed
adaptive index for inclusion in said coded speech signal in place of said optimum adaptive index, responsive to a
control signal designating that said coded speech signal should represent speech of monotone pitch.

7. The coder of claim 1, further comprising a speed controller (124) for detecting periodicity in said input speech and
deleting portions of said input speech signal responsive to a speed control signal, the portions deleted by said speed
controller (124) having lengths corresponding to the periodicity detected by said speed controller (124).

8. The coder of claim 7, wherein said speed controller (124) also interpolates new portions into said input speech
signal responsive to said speed control signal, the portions interpolated by said speed controller (124) having lengths
corresponding to the periodicity detected by said speed controller (124).

9. A code-excited linear predictive decoder for decoding a coded speech signal created by the code-excited linear
predictive coder of claim 1, comprising:

an interface circuit (70), for demultiplexing said coded speech signal to obtain coefficient information, power
information, an adaptive index, a selection index, a constant index, and a gain index;

a coefficient dequantizer (117) coupled to said interface circuit (70), for dequantizing said coefficient information
to obtain line-spectrum-pair coefficients, and converting said line-spectrum-pair coefficients to dequantized
linear predictive coefficients;

a power dequantizer (118) coupled to said interface circuit (70), for dequantizing said power information to
obtain a dequantized power value;

an adaptive codebook (105) for storing a plurality of candidate waveforms, modifying said candidate wave- 
forms responsive to a final excitation signal, and outputting one of said candidate waveforms, responsive to said 
adaptive index, as an adaptive excitation signal; 

a stochastic codebook (106) for storing a plurality of white-noise waveforms, and outputting one of said white-noise
waveforms, responsive to said constant index, as a stochastic excitation signal;

a pulse codebook (107) for storing a plurality of periodic impulsive waveforms, and outputting one of said 
periodic impulsive waveforms, responsive to said constant index, as an impulsive excitation signal; 

a selector (113) coupled to said stochastic codebook (106) and said pulse codebook (107), for selecting a 
constant excitation signal by choosing between said stochastic excitation signal and said impulsive excitation signal, 
responsive to said selection index; 

a conversion filter (109) coupled to said selector (113), for converting said constant excitation signal,
responsive to said adaptive index and said dequantized linear predictive coefficients, to produce a varied excitation signal
more closely resembling said speech signal in frequency characteristics;

a gain codebook (108) coupled to said power dequantizer (118), for storing a plurality of pairs of gain values,
outputting one of said pairs responsive to said gain index, and scaling said one of said pairs responsive to said
dequantized power value, thereby producing a first gain value and a second gain value;

a first multiplier (110) coupled to said gain codebook (108) and said adaptive codebook (105), for multiplying
said adaptive excitation signal by said first gain value to produce a first gain-controlled excitation signal;

a second multiplier (111) coupled to said gain codebook (108) and said conversion filter (109), for multiplying
said varied excitation signal by said second gain value to produce a second gain-controlled excitation signal;

a first adder (112) coupled to said first multiplier (110) and said second multiplier (111), for adding said first
gain-controlled excitation signal and said second gain-controlled excitation signal to produce said final excitation
signal; and

a filtering circuit (90) coupled to said first adder (112), for creating a reproduced speech signal from said 
dequantized linear predictive coefficients and said final excitation signal. 
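
Illustrative note (not part of the claims): the signal path from the two multipliers through the first adder and the filtering circuit in claim 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, signals are plain lists of samples, and the filtering circuit (90) is assumed to be a plain all-pole synthesis filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The claims themselves do not fix a sign convention or rule out additional filtering.

```python
def final_excitation(ea, ev, g1, g2):
    """First and second multipliers (110, 111) followed by the first adder (112).

    ea: adaptive excitation signal, ev: varied excitation signal,
    g1, g2: first and second gain values (already scaled by the power value).
    """
    return [g1 * a + g2 * v for a, v in zip(ea, ev)]


def synthesize(e, lpc, memory=None):
    """All-pole synthesis 1/A(z), ASSUMING A(z) = 1 + sum_k lpc[k-1] * z^-k.

    `memory` holds the previous len(lpc) output samples, most recent first.
    Returns the output samples and the updated memory.
    """
    mem = list(memory) if memory is not None else [0.0] * len(lpc)
    out = []
    for x in e:
        y = x - sum(a * m for a, m in zip(lpc, mem))
        out.append(y)
        mem = [y] + mem[:-1]  # newest output sample goes to the front
    return out, mem
```

Passing the returned memory back into the next call keeps the output continuous from one block of excitation samples to the next.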

10. The decoder of claim 9, wherein the candidate waveforms stored in said adaptive codebook (105) are past segments
of said final excitation signal, said adaptive index denoting respective starting points of said segments. 
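
Illustrative note (not part of the claims): an adaptive codebook whose candidates are past segments of the final excitation can be pictured as a sliding history buffer. The sketch below is hedged accordingly: the class and method names are hypothetical, the adaptive index is assumed to map directly to a lag counted back from the newest sample, and the periodic extension used when the lag is shorter than the requested segment is borrowed from common CELP practice rather than stated in the claims.

```python
class AdaptiveCodebook:
    """History of past final-excitation samples (oldest first)."""

    def __init__(self, size):
        self.history = [0.0] * size

    def lookup(self, lag, length):
        """Return `length` samples starting `lag` samples before the end.

        ASSUMPTION: when lag < length, the segment is repeated periodically.
        """
        if not 1 <= lag <= len(self.history):
            raise ValueError("lag out of range")
        segment = self.history[len(self.history) - lag:]
        return [segment[i % lag] for i in range(length)]

    def update(self, final_excitation):
        """Shift in the newest final excitation, discarding the oldest samples."""
        size = len(self.history)
        self.history = (self.history + list(final_excitation))[-size:]
```

Because the buffer is refreshed with the final excitation after every lookup, coder and decoder stay synchronized as long as both apply the same update.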

11. The decoder of claim 9, wherein each of the impulsive waveforms stored in said pulse codebook (107) consists of
a single isolated impulse, said pulse index denoting position of said single isolated impulse. 

12. The decoder of claim 11 wherein, when said selector (113) selects said impulsive excitation signal, said conversion
filter (109) produces a varied excitation signal consisting of pulse clusters with a shape responsive to said
dequantized linear predictive coefficients, repeated at intervals determined by said adaptive index, starting from a position
determined by said pulse index.
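
Illustrative note (not part of the claims): one plausible reading of claim 12 is an impulse train at the pitch interval, with each impulse spread into a short cluster whose shape depends on the linear predictive coefficients. The Python sketch below illustrates only that reading. The function names, the bandwidth-expansion factor `gamma`, the cluster length `taps`, and the choice of a truncated impulse response of 1/A(z/gamma) as the cluster shape are all assumptions; the claims do not specify the conversion filter at this level of detail.

```python
def impulse_train(length, position, lag):
    """Single impulse at `position`, repeated every `lag` samples."""
    train = [0.0] * length
    for n in range(position, length, lag):
        train[n] = 1.0
    return train


def cluster_shape(lpc, gamma=0.8, taps=5):
    """Truncated impulse response of 1/A(z/gamma) (ASSUMED cluster shape),
    with A(z) = 1 + sum_k lpc[k-1] * z^-k."""
    weighted = [a * gamma ** (k + 1) for k, a in enumerate(lpc)]
    h = []
    for n in range(taps):
        x = 1.0 if n == 0 else 0.0
        y = x - sum(w * h[n - 1 - k]
                    for k, w in enumerate(weighted) if n - 1 - k >= 0)
        h.append(y)
    return h


def varied_excitation(length, position, lag, lpc):
    """Convolve the impulse train with the cluster shape, truncated to `length`."""
    h = cluster_shape(lpc)
    out = [0.0] * length
    for n, x in enumerate(impulse_train(length, position, lag)):
        if x:
            for k, hk in enumerate(h):
                if n + k < length:
                    out[n + k] += x * hk
    return out
```

In this reading the pulse index supplies `position`, the adaptive index supplies `lag`, and the coefficients determine how each isolated impulse is smeared into a cluster.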

13. The decoder of claim 9, wherein said stochastic codebook (106), said pulse codebook (107), and said selector (113)
are replaced by a single fixed codebook (150) storing both said white-noise waveforms and said impulsive waveforms,
and said stochastic index, said pulse index, and said selection index are replaced by a single combined index. 

14. The decoder of claim 9, further comprising an index converter (122) for converting the adaptive index demultiplexed
by said interface circuit (70) to a fixed adaptive index, responsive to a control signal designating that said reproduced 
speech signal should have a monotone pitch. 
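
Illustrative note (not part of the claims): the index converter of claim 14 amounts to overriding the received adaptive index whenever the monotone control signal is active. A minimal Python sketch follows; the function name and the particular fixed value 40 are hypothetical and chosen only for illustration.

```python
def convert_adaptive_index(adaptive_index, monotone, fixed_index=40):
    """Index converter (122): pass the index through, or pin it when monotone.

    ASSUMPTION: fixed_index=40 is an arbitrary illustrative value.
    """
    return fixed_index if monotone else adaptive_index


# Received indices for four successive subframes (made-up values).
received = [37, 41, 44, 52]
monotone_indices = [convert_adaptive_index(i, monotone=True) for i in received]
```

Since the adaptive index denotes the starting point of a past excitation segment, holding it constant holds the repetition interval of the excitation constant, which is what gives the reproduced speech a fixed pitch.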

15. The decoder of claim 9, further comprising a speed controller (132) for detecting periodicity in said final excitation 
signal and deleting portions of said final excitation signal responsive to a speed control signal, the portions deleted 
by said speed controller (132) having lengths corresponding to the periodicity detected by said speed controller 
(132). 

16. The decoder of claim 15, wherein said speed controller (132) also interpolates new portions into said final excitation
signal responsive to said speed control signal, the portions interpolated by said speed controller (132) having lengths
corresponding to the periodicity detected by said speed controller (132).

17. The decoder of claim 9, further comprising: 

a noise generator (140) for generating a white-noise signal; and

a second adder (142) for modifying said reproduced speech signal by adding said white-noise signal to said
reproduced speech signal.

18. An improved code-excited linear predictive coder of the type having an adaptive codebook (105) for storing a plurality
of candidate waveforms, outputting one of said candidate waveforms in response to an optimum adaptive index,
and modifying said candidate waveforms in response to an optimum excitation signal, and an interface circuit (60)
for generating a coded speech signal of which said optimum adaptive index forms one part, the improvement
comprising:

an index converter (120) for supplying said interface circuit (60) with a fixed adaptive index for inclusion in
said coded speech signal in place of said optimum adaptive index, responsive to a control signal designating that 
said coded speech signal should represent speech of monotone pitch. 

19. The coder of claim 18, wherein the candidate waveforms stored in said adaptive codebook (105) are past segments
of said optimum excitation signal, said adaptive index denoting respective starting points of said segments. 

20. An improved code-excited linear predictive decoder of the type having an interface circuit (70) for obtaining an
optimum adaptive index and coefficient information from a coded speech signal, an adaptive codebook (105) for 
storing a plurality of candidate waveforms, outputting one of said candidate waveforms in response to said optimum 
adaptive index, and modifying said candidate waveforms in response to an excitation signal derived from said one 
of said candidate waveforms, and a filtering circuit (90) for filtering said excitation signal according to said coefficient 
information to generate a reproduced speech signal, the improvement comprising: 

an index converter (122) for supplying said adaptive codebook (105) with a fixed adaptive index in place of
said optimum adaptive index, responsive to a control signal designating that said reproduced speech signal should
have monotone pitch.

21. The decoder of claim 20, wherein the candidate waveforms stored in said adaptive codebook (105) are past
segments of said excitation signal, said adaptive index denoting respective starting points of said segments.

22. An improved code-excited linear predictive coder of the type that receives and codes an input speech signal, the
improvement comprising: 

a speed controller (124) for detecting periodicity in said input speech signal and deleting portions of said 
input speech signal responsive to a speed control signal, the portions thus deleted having lengths responsive to
said periodicity. 

23. The code-excited linear predictive coder of claim 22, wherein said speed controller (124) also interpolates new
portions into said input speech signal portions, responsive to said speed control signal, said new portions having 
lengths responsive to said periodicity. 

24. The code-excited linear predictive coder of claim 23, wherein said input speech signal consists of samples, said
samples are grouped into frames of a fixed number of samples, and said speed controller (124) comprises: 

a buffer memory (126) for temporarily storing a plurality of said frames; 

a periodicity analyzer (128) coupled to said buffer memory (126), for analyzing the periodicity of each frame 
among said frames, and assigning to each said frame a cycle count corresponding to said periodicity; and 

a length adjuster (130) coupled to said periodicity analyzer (128), for deleting from said frame at least one
block of contiguous samples, equal in number to said cycle count, if said speed control signal designates a speed
faster than normal speaking speed, and interpolating in said frame at least one block of contiguous samples, equal
in number to said cycle count, if said speed control signal designates a speed slower than normal speaking speed.

25. The code-excited linear predictive coder of claim 24, wherein said length adjuster (130) interpolates by repeating
an existing block of contiguous samples in said frame. 

26. The code-excited linear predictive coder of claim 24, wherein after interpolating, and after deleting, said length
adjuster (130) regroups said samples into new frames having said fixed number of samples each.
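
Illustrative note (not part of the claims): the periodicity analyzer, length adjuster, and regrouping step of claims 24 to 26 can be sketched in Python as follows. The sketch rests on several assumptions that the claims leave open: periodicity is estimated by maximizing a plain autocorrelation over a hypothetical lag range (`min_lag`, `max_lag`), the deleted block is the last one in the frame, and interpolation repeats the last block. All function names are hypothetical.

```python
def cycle_count(frame, min_lag, max_lag):
    """Periodicity analyzer: lag (in samples) maximizing the autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def adjust_length(frame, count, faster):
    """Length adjuster: drop one block of `count` samples (faster playback)
    or repeat one existing block of `count` samples (slower playback)."""
    if not 0 < count <= len(frame):
        raise ValueError("cycle count must be between 1 and the frame length")
    if faster:
        return frame[:len(frame) - count]
    return frame + frame[-count:]


def regroup(samples, frame_size):
    """Regroup samples into full frames; return (frames, leftover samples)."""
    full = len(samples) - len(samples) % frame_size
    frames = [samples[i:i + frame_size] for i in range(0, full, frame_size)]
    return frames, samples[full:]
```

Removing or repeating exactly one cycle leaves the waveform phase-aligned at the splice point, which is why the block length is tied to the detected cycle count rather than to a fixed number of samples.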

27. An improved code-excited linear predictive decoder of the type having an interface circuit (70) for demultiplexing a
coded speech signal to obtain index information and coefficient information, an excitation circuit (40) for creating an
excitation signal from said index information, and a filtering circuit (90) for filtering said excitation signal according
to said coefficient information to generate a reproduced speech signal, the improvement comprising:

a speed controller (132) for detecting periodicity in said excitation signal, dividing said excitation signal into
cycles according to said periodicity, and altering said excitation signal by deleting whole cycles of said excitation
signal, responsive to a speed control signal.

28. The code-excited linear predictive decoder of claim 27, wherein said speed controller (132) also interpolates whole 
cycles into said excitation signal, responsive to said speed control signal. 

29. The code-excited linear predictive decoder of claim 28, wherein said speed controller (132) comprises:

a buffer memory (134) for temporarily storing at least one segment of said excitation signal, consisting of a
certain number of samples;

a periodicity analyzer (136) coupled to said buffer memory (134), for analyzing the periodicity of said segment
and assigning to said segment a corresponding cycle count; and

a length adjuster (138) coupled to said periodicity analyzer (136), for deleting from said segment at least one
block of contiguous samples, equal in number to said cycle count, if said speed control signal designates a speed 
faster than normal speaking speed, and interpolating into said frame at least one block of contiguous samples, equal 
in number to said cycle count, if said speed control signal designates a speed slower than normal speaking speed. 

30. The code-excited linear predictive coder of claim 29, wherein said length adjuster (138) interpolates by repeating
an existing block of contiguous samples in said segment.

31. An improved code-excited linear predictive decoder of the type having an interface circuit (70) for demultiplexing a 
coded speech signal to obtain index information and coefficient information, an excitation circuit (40) for creating an 
excitation signal from said index information, and a filtering circuit (90) for filtering said excitation signal according 
to said coefficient information to generate a reproduced speech signal, the improvement comprising: 

a white-noise generator (140) for adding white noise to said reproduced speech signal. 

32. The code-excited linear predictive decoder of claim 31, wherein said interface circuit (70) also demultiplexes power
information, and said white noise is generated responsive to said power information. 
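
Illustrative note (not part of the claims): claim 32 states only that the white noise is generated responsive to the power information. The Python sketch below assumes one simple relationship, namely noise whose RMS level is a fixed fraction of the square root of the dequantized power value. The function name, the `level` parameter and its default, and the use of uniformly distributed noise are assumptions made for illustration.

```python
import random


def add_white_noise(speech, power, level=0.05, rng=None):
    """Add uniform white noise with RMS equal to `level` * sqrt(power).

    ASSUMPTION: the scaling law and the value of `level` are illustrative only.
    """
    if rng is None:
        rng = random.Random()
    # Uniform noise on [-a, a] has RMS a / sqrt(3), so a = rms * sqrt(3).
    amplitude = level * (power ** 0.5) * (3 ** 0.5)
    return [s + rng.uniform(-amplitude, amplitude) for s in speech]
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is convenient when comparing decoder outputs.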

33. A method of generating an excitation signal for code-excited linear predictive coding and decoding of an input speech 
signal, comprising the steps of: 

calculating linear predictive coefficients of said input speech signal; 
calculating a power value of said input speech signal; 

selecting an adaptive excitation signal, corresponding to an adaptive index, from an adaptive codebook (105);
selecting a stochastic excitation signal from a stochastic codebook (106); 
selecting an impulsive excitation signal from a pulse codebook (107); 

selecting a constant excitation signal by choosing between said stochastic excitation signal and said impulsive 
excitation signal; 

selecting a pair of gain values from a gain codebook (108); 

filtering said constant excitation signal, using filter coefficients derived from said adaptive index and said 
linear predictive coefficients, to convert said constant excitation signal to a varied excitation signal more closely 
resembling said input speech signal; 

combining said varied excitation signal and said adaptive excitation signal according to said power value and 
said pair of gain values to produce a final excitation signal; and 

using said final excitation signal to update said adaptive codebook (105). 

34. The method of claim 33, wherein calculating said linear predictive coefficients comprises the further steps of: 

calculating line-spectrum-pair coefficients of said input speech signal;

quantizing said line-spectrum-pair coefficients to obtain coefficient information; 

dequantizing said coefficient information to obtain dequantized line-spectrum-pair coefficients; and 

converting said dequantized line-spectrum-pair coefficients to said linear predictive coefficients. 
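
Illustrative note (not part of the claims): the final step of claim 34, converting line-spectrum-pair coefficients to linear predictive coefficients, can be sketched with the standard textbook construction, in which the LSP frequencies define a symmetric polynomial P(z) and an antisymmetric polynomial Q(z) and A(z) = (P(z) + Q(z)) / 2. The function names are hypothetical, and the sketch assumes an even number of LSP frequencies given in radians in ascending order, with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The claims do not prescribe this particular procedure.

```python
import math


def _polymul(p, q):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def lsp_to_lpc(lsp):
    """Convert ascending LSP frequencies (radians) to [1, a_1, ..., a_p]."""
    if len(lsp) % 2:
        raise ValueError("expected an even number of LSP frequencies")
    p_poly = [1.0, 1.0]   # P(z) has a fixed root at z = -1
    q_poly = [1.0, -1.0]  # Q(z) has a fixed root at z = +1
    for i, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:    # 1st, 3rd, ... frequencies belong to P(z)
            p_poly = _polymul(p_poly, factor)
        else:             # 2nd, 4th, ... frequencies belong to Q(z)
            q_poly = _polymul(q_poly, factor)
    a = [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)]
    return a[:-1]         # the z^-(p+1) terms of P and Q cancel
```

For a second-order example, the frequencies arccos(0.7) and arccos(0.2) correspond to A(z) = 1 - 0.9 z^-1 + 0.5 z^-2, which the function reproduces to within floating-point rounding.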

35. The method of claim 33, wherein said adaptive codebook (105) stores candidate waveforms comprising past
segments of said final excitation signal, said adaptive index denoting respective starting points of said segments.

36. The method of claim 33, wherein said pulse codebook (107) stores impulsive waveforms, each consisting of a single
isolated impulse. 

37. The method of claim 36 wherein, when said impulsive excitation signal is selected as said constant excitation signal,
said conversion filter (109) produces a varied excitation signal consisting of pulse clusters with a shape responsive
to said linear predictive coefficients, repeated at intervals determined by said adaptive index, starting from a position
determined by said pulse index.

38. The method of claim 33, wherein said stochastic codebook (106) and said pulse codebook (107) are combined into
a single fixed codebook (150) storing both stochastic excitation signals and impulsive excitation signals, from among
which said constant excitation signal is selected directly.

39. The method of claim 33, comprising the further step of converting said adaptive index to a fixed value, responsive
to a control signal designating monotone speech. 

40. The method of claim 33, comprising the further steps of: 

analyzing periodicity of said input speech signal to determine a cycle length of said input speech signal; and
deleting portions of said input speech signal, having lengths equal to said cycle length, responsive to a speed
control signal. 

41. The method of claim 40, comprising the further step of interpolating new portions into said input speech signal,
responsive to said speed control signal, said new portions having lengths equal to said cycle length. 

42. The method of claim 33, comprising the further steps of:

analyzing periodicity of said final excitation signal to determine a cycle length of said final excitation signal; and

deleting portions of said final excitation signal, having lengths equal to said cycle length, responsive to a
speed control signal.

43. The method of claim 42, comprising the further step of interpolating new portions into said final excitation signal,
responsive to said speed control signal, said new portions having lengths equal to said cycle length.

44. A method of decoding a coded speech signal, comprising the steps of: 

demultiplexing said coded speech signal to obtain power information, coefficient information, an adaptive
index, a constant index, a selection index, and a gain index;

dequantizing said power information to obtain a power value;

dequantizing said coefficient information to obtain linear predictive coefficients;

selecting an adaptive excitation signal from an adaptive codebook (105), responsive to said adaptive index;

selecting a stochastic excitation signal from a stochastic codebook (106), responsive to said stochastic index;

selecting an impulsive excitation signal from a pulse codebook (107), responsive to said pulse index;

selecting a constant excitation signal by choosing between said stochastic excitation signal and said impulsive
excitation signal, responsive to said selection index;

selecting a pair of gain values from a gain codebook (108), responsive to said gain index;

filtering said constant excitation signal, using filter coefficients derived from said adaptive index and said
linear predictive coefficients, to convert said constant excitation signal to a varied excitation signal;

combining said varied excitation signal and said adaptive excitation signal according to said power value and
said pair of gain values to produce a final excitation signal;

using said final excitation signal to update said adaptive codebook (105);

filtering said final excitation with said linear predictive coefficients to generate a reproduced speech signal;
generating a white-noise signal; and 

adding said white-noise signal to said reproduced speech signal to generate an output speech signal. 

45. The method of claim 44, wherein dequantizing said coefficient information comprises: 

obtaining line-spectrum-pair coefficients from said coefficient information; and 
converting said line-spectrum-pair coefficients to said linear predictive coefficients.

46. The method of claim 44, wherein said stochastic codebook (106) and said pulse codebook (107) are combined into
a single fixed codebook (150) storing both stochastic excitation signals and impulsive excitation signals, from among
which said constant excitation signal is selected.